Overview

Dataset statistics

Number of variables: 9
Number of observations: 418
Missing cells: 176
Missing cells (%): 4.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 32.7 KiB
Average record size in memory: 80.0 B
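
The missing-cell percentage can be checked from the counts above, assuming that the total number of cells is observations times variables. A minimal sketch (the variable names are illustrative and not part of the report):

```python
# Values copied from the dataset statistics above.
n_variables = 9
n_observations = 418
missing_cells = 176

# Assumption: every observation contributes one cell per variable.
total_cells = n_variables * n_observations

# Share of all cells that are missing, as a percentage.
missing_pct = 100 * missing_cells / total_cells

print(f"{total_cells} cells, {missing_pct:.1f}% missing")
```

Rounded to one decimal, this agrees with the reported 4.7%.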

Variable types

Categorical: 1
Numeric: 8

Alerts

area has a high cardinality: 416 distinct values (High cardinality)
single is highly correlated with married, no kids and 5 other fields (High correlation)
married, no kids is highly correlated with single and 5 other fields (High correlation)
not married, no kids is highly correlated with single and 5 other fields (High correlation)
married, with kids is highly correlated with single and 5 other fields (High correlation)
not married, with kids is highly correlated with single and 5 other fields (High correlation)
single parent is highly correlated with single and 5 other fields (High correlation)
other is highly correlated with single and 5 other fields (High correlation)
single is highly correlated with married, no kids and 3 other fields (High correlation)
married, no kids is highly correlated with single and 5 other fields (High correlation)
not married, no kids is highly correlated with single and 3 other fields (High correlation)
married, with kids is highly correlated with married, no kids and 3 other fields (High correlation)
not married, with kids is highly correlated with married, no kids and 4 other fields (High correlation)
single parent is highly correlated with single and 4 other fields (High correlation)
other is highly correlated with single and 5 other fields (High correlation)
single is highly correlated with not married, no kids and 1 other fields (High correlation)
married, no kids is highly correlated with married, with kids and 2 other fields (High correlation)
not married, no kids is highly correlated with single and 1 other fields (High correlation)
married, with kids is highly correlated with married, no kids and 2 other fields (High correlation)
not married, with kids is highly correlated with married, no kids and 2 other fields (High correlation)
single parent is highly correlated with married, no kids and 2 other fields (High correlation)
other is highly correlated with single and 1 other fields (High correlation)
single is highly correlated with married, no kids and 3 other fields (High correlation)
married, no kids is highly correlated with single and 4 other fields (High correlation)
not married, no kids is highly correlated with single and 4 other fields (High correlation)
married, with kids is highly correlated with married, no kids and 4 other fields (High correlation)
not married, with kids is highly correlated with married, no kids and 3 other fields (High correlation)
single parent is highly correlated with single and 5 other fields (High correlation)
other is highly correlated with single and 4 other fields (High correlation)
single has 5 (1.2%) missing values (Missing)
married, no kids has 17 (4.1%) missing values (Missing)
not married, no kids has 14 (3.3%) missing values (Missing)
married, with kids has 22 (5.3%) missing values (Missing)
not married, with kids has 25 (6.0%) missing values (Missing)
single parent has 21 (5.0%) missing values (Missing)
other has 40 (9.6%) missing values (Missing)
average woz value has 32 (7.7%) missing values (Missing)
area is uniformly distributed (Uniform)

Reproduction

Analysis started: 2023-04-12 20:54:17.562109
Analysis finished: 2023-04-12 20:54:28.598317
Duration: 11.04 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

area
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 416
Distinct (%): 99.5%
Missing: 0
Missing (%): 0.0%
Memory size: 6.5 KiB

Value | Count
F11f Teleport | 2
T92d Amstel III deel A/B Zuid | 2
A01a Stationsplein e.o. | 1
F78b Buurt 7 | 1
K49e Minervabuurt Midden | 1
Other values (411) | 411

Length

Max length: 41
Median length: 21
Mean length: 21.3708134
Min length: 8

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 414
Unique (%): 99.0%

Sample

1st row: A00a Kop Zeedijk
2nd row: A00b Oude Kerk e.o.
3rd row: A00c Burgwallen Oost
4th row: A00d Nes e.o.
5th row: A00e BG-terrein e.o.

Common Values

Value | Count | Frequency (%)
F11f Teleport | 2 | 0.5%
T92d Amstel III deel A/B Zuid | 2 | 0.5%
A01a Stationsplein e.o. | 1 | 0.2%
F78b Buurt 7 | 1 | 0.2%
K49e Minervabuurt Midden | 1 | 0.2%
B10b Alfa-driehoek | 1 | 0.2%
M27b Weesperzijde Midden/Zuid | 1 | 0.2%
N71f NDSM terrein | 1 | 0.2%
K47g Harmoniehofbuurt | 1 | 0.2%
A03d Amstelveldbuurt | 1 | 0.2%
Other values (406) | 406 | 97.1%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
zuid | 40 | 3.3%
noord | 40 | 3.3%
oost | 35 | 2.9%
west | 28 | 2.3%
e.o | 20 | 1.7%
midden | 16 | 1.3%
de | 14 | 1.2%
zuidoost | 12 | 1.0%
van | 11 | 0.9%
zuidwest | 10 | 0.8%
Other values (797) | 972 | 81.1%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

single
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 334
Distinct (%): 80.9%
Missing: 5
Missing (%): 1.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 587.2711864
Minimum: 1
Maximum: 2059
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 18
Q1: 240
median: 527
Q3: 857
95-th percentile: 1367
Maximum: 2059
Range: 2058
Interquartile range (IQR): 617

Descriptive statistics

Standard deviation: 424.2675288
Coefficient of variation (CV): 0.722438864
Kurtosis: -0.01124621908
Mean: 587.2711864
Median Absolute Deviation (MAD): 306
Skewness: 0.6877007086
Sum: 242543
Variance: 180002.936
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
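
Several of the statistics reported for "single" are tied together by simple identities, which gives a quick consistency check. The sketch below assumes the usual definitions (mean as sum over non-missing observations, CV as standard deviation over mean, variance as the square of the standard deviation); the variable names are illustrative only.

```python
# Values copied from the statistics reported for "single" above.
n_observations = 418
n_missing = 5
total = 242543            # Sum
std = 424.2675288         # Standard deviation
q1, q3 = 240, 857
minimum, maximum = 1, 2059

# Assumption: the mean is taken over non-missing observations only.
n_valid = n_observations - n_missing
mean = total / n_valid

cv = std / mean                   # coefficient of variation
variance = std ** 2               # variance as squared standard deviation
iqr = q3 - q1                     # interquartile range
value_range = maximum - minimum   # range

print(mean, cv, variance, iqr, value_range)
```

Under these assumptions the results match the reported Mean, CV, Variance, IQR and Range up to rounding.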
Value | Count | Frequency (%)
2 | 4 | 1.0%
1 | 4 | 1.0%
705 | 4 | 1.0%
205 | 3 | 0.7%
569 | 3 | 0.7%
1162 | 3 | 0.7%
199 | 3 | 0.7%
519 | 3 | 0.7%
346 | 3 | 0.7%
18 | 3 | 0.7%
Other values (324) | 380 | 90.9%
(Missing) | 5 | 1.2%

Value | Count | Frequency (%)
1 | 4 | 1.0%
2 | 4 | 1.0%
3 | 2 | 0.5%
6 | 2 | 0.5%
7 | 1 | 0.2%
9 | 3 | 0.7%
12 | 2 | 0.5%
14 | 1 | 0.2%
15 | 1 | 0.2%
18 | 3 | 0.7%

Value | Count | Frequency (%)
2059 | 1 | 0.2%
2013 | 1 | 0.2%
1876 | 1 | 0.2%
1826 | 1 | 0.2%
1734 | 1 | 0.2%
1598 | 1 | 0.2%
1559 | 1 | 0.2%
1499 | 1 | 0.2%
1490 | 1 | 0.2%
1458 | 1 | 0.2%

married, no kids
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 202
Distinct (%): 50.4%
Missing: 17
Missing (%): 4.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 99.57605985
Minimum: 1
Maximum: 381
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 6
Q1: 45
median: 86
Q3: 142
95-th percentile: 242
Maximum: 381
Range: 380
Interquartile range (IQR): 97

Descriptive statistics

Standard deviation: 73.83576251
Coefficient of variation (CV): 0.7415011462
Kurtosis: 0.7664187582
Mean: 99.57605985
Median Absolute Deviation (MAD): 47
Skewness: 0.9349869956
Sum: 39930
Variance: 5451.719825
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
14 | 9 | 2.2%
63 | 8 | 1.9%
1 | 7 | 1.7%
71 | 6 | 1.4%
123 | 5 | 1.2%
6 | 5 | 1.2%
139 | 5 | 1.2%
119 | 4 | 1.0%
4 | 4 | 1.0%
69 | 4 | 1.0%
Other values (192) | 344 | 82.3%
(Missing) | 17 | 4.1%

Value | Count | Frequency (%)
1 | 7 | 1.7%
2 | 3 | 0.7%
3 | 4 | 1.0%
4 | 4 | 1.0%
5 | 2 | 0.5%
6 | 5 | 1.2%
7 | 3 | 0.7%
8 | 4 | 1.0%
9 | 1 | 0.2%
10 | 2 | 0.5%

Value | Count | Frequency (%)
381 | 1 | 0.2%
362 | 1 | 0.2%
326 | 1 | 0.2%
324 | 1 | 0.2%
316 | 1 | 0.2%
311 | 1 | 0.2%
296 | 1 | 0.2%
291 | 1 | 0.2%
290 | 1 | 0.2%
286 | 1 | 0.2%

not married, no kids
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 228
Distinct (%): 56.4%
Missing: 14
Missing (%): 3.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 142.4009901
Minimum: 1
Maximum: 676
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 6
Q1: 59
median: 106
Q3: 205
95-th percentile: 374.05
Maximum: 676
Range: 675
Interquartile range (IQR): 146

Descriptive statistics

Standard deviation: 116.4075053
Coefficient of variation (CV): 0.8174627524
Kurtosis: 1.889439571
Mean: 142.4009901
Median Absolute Deviation (MAD): 62
Skewness: 1.296737181
Sum: 57530
Variance: 13550.70729
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
74 | 7 | 1.7%
2 | 6 | 1.4%
1 | 6 | 1.4%
124 | 5 | 1.2%
70 | 5 | 1.2%
62 | 5 | 1.2%
142 | 5 | 1.2%
58 | 5 | 1.2%
168 | 4 | 1.0%
5 | 4 | 1.0%
Other values (218) | 352 | 84.2%
(Missing) | 14 | 3.3%

Value | Count | Frequency (%)
1 | 6 | 1.4%
2 | 6 | 1.4%
3 | 1 | 0.2%
4 | 2 | 0.5%
5 | 4 | 1.0%
6 | 4 | 1.0%
9 | 2 | 0.5%
10 | 3 | 0.7%
11 | 1 | 0.2%
12 | 1 | 0.2%

Value | Count | Frequency (%)
676 | 1 | 0.2%
628 | 1 | 0.2%
538 | 2 | 0.5%
478 | 1 | 0.2%
472 | 1 | 0.2%
471 | 1 | 0.2%
460 | 1 | 0.2%
437 | 1 | 0.2%
422 | 1 | 0.2%
419 | 1 | 0.2%

married, with kids
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 211
Distinct (%): 53.3%
Missing: 22
Missing (%): 5.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 126.0833333
Minimum: 1
Maximum: 738
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 4
Q1: 42
median: 92.5
Q3: 177
95-th percentile: 331.75
Maximum: 738
Range: 737
Interquartile range (IQR): 135

Descriptive statistics

Standard deviation: 119.8907757
Coefficient of variation (CV): 0.9508852006
Kurtosis: 4.82947465
Mean: 126.0833333
Median Absolute Deviation (MAD): 59.5
Skewness: 1.883304214
Sum: 49929
Variance: 14373.7981
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
1 | 8 | 1.9%
14 | 6 | 1.4%
101 | 6 | 1.4%
74 | 6 | 1.4%
2 | 6 | 1.4%
72 | 5 | 1.2%
10 | 5 | 1.2%
117 | 5 | 1.2%
81 | 5 | 1.2%
56 | 4 | 1.0%
Other values (201) | 340 | 81.3%
(Missing) | 22 | 5.3%

Value | Count | Frequency (%)
1 | 8 | 1.9%
2 | 6 | 1.4%
3 | 4 | 1.0%
4 | 3 | 0.7%
5 | 4 | 1.0%
6 | 3 | 0.7%
7 | 2 | 0.5%
8 | 2 | 0.5%
9 | 3 | 0.7%
10 | 5 | 1.2%

Value | Count | Frequency (%)
738 | 1 | 0.2%
655 | 1 | 0.2%
653 | 1 | 0.2%
617 | 1 | 0.2%
605 | 1 | 0.2%
570 | 1 | 0.2%
548 | 1 | 0.2%
536 | 1 | 0.2%
525 | 1 | 0.2%
437 | 1 | 0.2%

not married, with kids
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 132
Distinct (%): 33.6%
Missing: 25
Missing (%): 6.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 53.91348601
Minimum: 1
Maximum: 307
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 3
Q1: 20
median: 48
Q3: 74
95-th percentile: 133
Maximum: 307
Range: 306
Interquartile range (IQR): 54

Descriptive statistics

Standard deviation: 42.88793711
Coefficient of variation (CV): 0.7954955297
Kurtosis: 3.81527229
Mean: 53.91348601
Median Absolute Deviation (MAD): 28
Skewness: 1.416189058
Sum: 21188
Variance: 1839.375149
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
3 | 11 | 2.6%
12 | 10 | 2.4%
1 | 8 | 1.9%
2 | 8 | 1.9%
68 | 7 | 1.7%
41 | 7 | 1.7%
66 | 7 | 1.7%
51 | 7 | 1.7%
19 | 7 | 1.7%
62 | 7 | 1.7%
Other values (122) | 314 | 75.1%
(Missing) | 25 | 6.0%

Value | Count | Frequency (%)
1 | 8 | 1.9%
2 | 8 | 1.9%
3 | 11 | 2.6%
4 | 4 | 1.0%
5 | 4 | 1.0%
6 | 3 | 0.7%
7 | 4 | 1.0%
8 | 6 | 1.4%
9 | 3 | 0.7%
10 | 3 | 0.7%

Value | Count | Frequency (%)
307 | 1 | 0.2%
234 | 1 | 0.2%
230 | 1 | 0.2%
185 | 1 | 0.2%
171 | 1 | 0.2%
168 | 1 | 0.2%
167 | 1 | 0.2%
159 | 1 | 0.2%
157 | 1 | 0.2%
153 | 1 | 0.2%

single parent
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 186
Distinct (%): 46.9%
Missing: 21
Missing (%): 5.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 96.40806045
Minimum: 1
Maximum: 537
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 3
Q1: 26
median: 82
Q3: 141
95-th percentile: 266.8
Maximum: 537
Range: 536
Interquartile range (IQR): 115

Descriptive statistics

Standard deviation: 84.39292861
Coefficient of variation (CV): 0.8753721236
Kurtosis: 2.410663185
Mean: 96.40806045
Median Absolute Deviation (MAD): 58
Skewness: 1.319952839
Sum: 38274
Variance: 7122.1664
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
4 | 9 | 2.2%
20 | 8 | 1.9%
3 | 8 | 1.9%
2 | 7 | 1.7%
11 | 7 | 1.7%
6 | 7 | 1.7%
1 | 7 | 1.7%
42 | 6 | 1.4%
19 | 6 | 1.4%
51 | 5 | 1.2%
Other values (176) | 327 | 78.2%
(Missing) | 21 | 5.0%

Value | Count | Frequency (%)
1 | 7 | 1.7%
2 | 7 | 1.7%
3 | 8 | 1.9%
4 | 9 | 2.2%
5 | 4 | 1.0%
6 | 7 | 1.7%
7 | 2 | 0.5%
8 | 2 | 0.5%
9 | 2 | 0.5%
10 | 3 | 0.7%

Value | Count | Frequency (%)
537 | 1 | 0.2%
441 | 1 | 0.2%
364 | 1 | 0.2%
355 | 1 | 0.2%
349 | 1 | 0.2%
330 | 1 | 0.2%
328 | 1 | 0.2%
317 | 1 | 0.2%
312 | 1 | 0.2%
308 | 1 | 0.2%

other
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 45
Distinct (%): 11.9%
Missing: 40
Missing (%): 9.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.55291005
Minimum: 1
Maximum: 71
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 5
median: 11.5
Q3: 19
95-th percentile: 35.15
Maximum: 71
Range: 70
Interquartile range (IQR): 14

Descriptive statistics

Standard deviation: 11.0181237
Coefficient of variation (CV): 0.8129710635
Kurtosis: 2.488580908
Mean: 13.55291005
Median Absolute Deviation (MAD): 6.5
Skewness: 1.385380367
Sum: 5123
Variance: 121.3990499
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=45)
Value | Count | Frequency (%)
2 | 28 | 6.7%
6 | 21 | 5.0%
4 | 21 | 5.0%
1 | 20 | 4.8%
12 | 18 | 4.3%
7 | 18 | 4.3%
3 | 18 | 4.3%
9 | 16 | 3.8%
13 | 15 | 3.6%
14 | 14 | 3.3%
Other values (35) | 189 | 45.2%
(Missing) | 40 | 9.6%

Value | Count | Frequency (%)
1 | 20 | 4.8%
2 | 28 | 6.7%
3 | 18 | 4.3%
4 | 21 | 5.0%
5 | 13 | 3.1%
6 | 21 | 5.0%
7 | 18 | 4.3%
8 | 13 | 3.1%
9 | 16 | 3.8%
10 | 9 | 2.2%

Value | Count | Frequency (%)
71 | 1 | 0.2%
50 | 2 | 0.5%
47 | 3 | 0.7%
45 | 2 | 0.5%
43 | 2 | 0.5%
42 | 2 | 0.5%
40 | 1 | 0.2%
39 | 3 | 0.7%
38 | 1 | 0.2%
37 | 1 | 0.2%

average woz value
Real number (ℝ≥0)

MISSING

Distinct: 386
Distinct (%): 100.0%
Missing: 32
Missing (%): 7.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 465144.1114
Minimum: 22371
Maximum: 2293760
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.5 KiB

Quantile statistics

Minimum: 22371
5-th percentile: 220912.25
Q1: 325552.5
median: 418892
Q3: 527094.25
95-th percentile: 882381.75
Maximum: 2293760
Range: 2271389
Interquartile range (IQR): 201541.75

Descriptive statistics

Standard deviation: 237377.0592
Coefficient of variation (CV): 0.5103301394
Kurtosis: 11.76381048
Mean: 465144.1114
Median Absolute Deviation (MAD): 101382
Skewness: 2.511763087
Sum: 179545627
Variance: 5.634786823 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
444738 | 1 | 0.2%
426580 | 1 | 0.2%
328566 | 1 | 0.2%
484906 | 1 | 0.2%
332620 | 1 | 0.2%
513878 | 1 | 0.2%
294382 | 1 | 0.2%
323671 | 1 | 0.2%
239875 | 1 | 0.2%
527281 | 1 | 0.2%
Other values (376) | 376 | 90.0%
(Missing) | 32 | 7.7%

Value | Count | Frequency (%)
22371 | 1 | 0.2%
26586 | 1 | 0.2%
48486 | 1 | 0.2%
75233 | 1 | 0.2%
103046 | 1 | 0.2%
103827 | 1 | 0.2%
120753 | 1 | 0.2%
131932 | 1 | 0.2%
166475 | 1 | 0.2%
170679 | 1 | 0.2%

Value | Count | Frequency (%)
2293760 | 1 | 0.2%
1594902 | 1 | 0.2%
1401899 | 1 | 0.2%
1373723 | 1 | 0.2%
1319586 | 1 | 0.2%
1316577 | 1 | 0.2%
1218480 | 1 | 0.2%
1198655 | 1 | 0.2%
1191126 | 1 | 0.2%
1091947 | 1 | 0.2%

Interactions

[Figures: pairwise interaction plots of the numeric variables; images not preserved]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
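
The rank-based calculation described above can be sketched in plain Python. This is an illustration only, not the implementation used by the report: the helper names `average_ranks`, `pearson` and `spearman` are hypothetical, and ties are given the mean of their rank positions (a common convention, assumed here). The example values are the `single` and `single parent` columns of the first five sample rows.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

# First five sample rows: "single" and "single parent".
single = [543.0, 331.0, 676.0, 130.0, 348.0]
single_parent = [22.0, 20.0, 43.0, 11.0, 14.0]

rho = spearman(single, single_parent)
print(round(rho, 3))
```

Five rows are far too few to say anything about the full dataset; the point is only to show the mechanics of ranking followed by a Pearson calculation.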

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
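
As a rough illustration of this definition and of the invariance property, the sketch below computes r by hand and then again after shifting and rescaling both inputs. The function name `pearson` and the chosen shift and scale constants are assumptions made for the example; the data are the `single` and `average woz value` columns of the first five sample rows.

```python
from math import sqrt

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return cov / (dev_x * dev_y)

# First five sample rows: "single" and "average woz value".
single = [543.0, 331.0, 676.0, 130.0, 348.0]
woz = [432583.0, 475037.0, 469793.0, 605863.0, 556369.0]

r = pearson(single, woz)

# Change location and (positive) scale of each variable separately.
single_shifted = [2.0 * v + 100.0 for v in single]
woz_in_thousands = [v / 1000.0 for v in woz]
r_transformed = pearson(single_shifted, woz_in_thousands)

print(r, r_transformed)
```

The two printed values agree up to floating-point error. Note that the example uses positive scale factors; a negative factor on one variable would flip the sign of r.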

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
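
The pair-counting rule can be written out directly. The sketch below implements the formula exactly as stated, without any tie correction (this form is often called tau-a); statistical libraries commonly apply a tie-corrected variant, so results may differ when ties are present. The function name `kendall_tau_a` is hypothetical, and the data are the `single` and `single parent` columns of the first five sample rows.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables order the pair the same way
        elif sign < 0:
            discordant += 1   # the variables order the pair in opposite ways
        # sign == 0 means a tie in x or y; it counts toward neither group
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# First five sample rows: "single" and "single parent".
single = [543.0, 331.0, 676.0, 130.0, 348.0]
single_parent = [22.0, 20.0, 43.0, 11.0, 14.0]

tau = kendall_tau_a(single, single_parent)
print(tau)
```

For these five rows, nine of the ten pairs are concordant and one is discordant, so τ = (9 - 1) / 10 = 0.8.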

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
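
One way to read "nullity correlation" is as an ordinary correlation between 0/1 missingness indicators. The sketch below assumes exactly that (a Pearson correlation on the indicators); whether the report's heatmap is computed this way is an assumption, and the helper names are hypothetical. The four rows are indices 4, 5, 6 and 417 from the sample tables.

```python
from math import isnan, sqrt

NAN = float("nan")

def missing_indicator(values):
    """Map each value to 1.0 if it is missing (NaN) and 0.0 otherwise."""
    return [1.0 if isnan(v) else 0.0 for v in values]

def pearson(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)

def nullity_correlation(a, b):
    """Correlation between the missingness patterns of two columns."""
    return pearson(missing_indicator(a), missing_indicator(b))

# Sample rows with indices 4, 5, 6 and 417.
married_no_kids = [24.0, NAN, 14.0, 144.0]
not_married_with_kids = [9.0, NAN, 8.0, 60.0]
other = [6.0, NAN, 7.0, NAN]

same_pattern = nullity_correlation(married_no_kids, not_married_with_kids)
partial_overlap = nullity_correlation(married_no_kids, other)
print(same_pattern, partial_overlap)
```

Columns that are missing in exactly the same rows get a correlation of 1, while a column that is additionally missing elsewhere gets a smaller positive value. A column with no missing values (or only missing values) has zero variance in its indicator, so the ratio is undefined and would need special handling.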

Sample

First rows

index | area | single | married, no kids | not married, no kids | married, with kids | not married, with kids | single parent | other | average woz value
0 | A00a Kop Zeedijk | 543.0 | 37.0 | 149.0 | 14.0 | 12.0 | 22.0 | 12.0 | 432583.0
1 | A00b Oude Kerk e.o. | 331.0 | 20.0 | 104.0 | 8.0 | 3.0 | 20.0 | 12.0 | 475037.0
2 | A00c Burgwallen Oost | 676.0 | 55.0 | 231.0 | 30.0 | 23.0 | 43.0 | 26.0 | 469793.0
3 | A00d Nes e.o. | 130.0 | 16.0 | 59.0 | 5.0 | 8.0 | 11.0 | 5.0 | 605863.0
4 | A00e BG-terrein e.o. | 348.0 | 24.0 | 73.0 | 19.0 | 9.0 | 14.0 | 6.0 | 556369.0
5 | A01a Stationsplein e.o. | 1.0 | NaN | 1.0 | 1.0 | NaN | NaN | NaN | NaN
6 | A01b Hemelrijk | 251.0 | 14.0 | 93.0 | 6.0 | 8.0 | 11.0 | 7.0 | 464846.0
7 | A01c Nieuwendijk Noord | 203.0 | 16.0 | 69.0 | 7.0 | 4.0 | 10.0 | 6.0 | 447814.0
8 | A01d Spuistraat Noord | 328.0 | 42.0 | 128.0 | 16.0 | 5.0 | 15.0 | 7.0 | 506697.0
9 | A01e Nieuwe Kerk e.o. | 289.0 | 33.0 | 124.0 | 15.0 | 14.0 | 11.0 | 12.0 | 459527.0

Last rows

index | area | single | married, no kids | not married, no kids | married, with kids | not married, with kids | single parent | other | average woz value
408 | T96c Holendrecht Oost | 413.0 | 266.0 | 70.0 | 185.0 | 83.0 | 150.0 | 10.0 | 323513.0
409 | T96d Gaasperdam Noord | 541.0 | 76.0 | 88.0 | 79.0 | 55.0 | 170.0 | 17.0 | 251964.0
410 | T96e Gaasperdam Zuid | 614.0 | 70.0 | 68.0 | 36.0 | 44.0 | 119.0 | 11.0 | 221128.0
411 | T96f Reigersbos Midden | 670.0 | 119.0 | 80.0 | 139.0 | 70.0 | 215.0 | 12.0 | 243072.0
412 | T96g Reigersbos Zuid | 522.0 | 191.0 | 72.0 | 188.0 | 74.0 | 158.0 | 15.0 | 308590.0
413 | T97a Gein Noordwest | 977.0 | 71.0 | 142.0 | 88.0 | 63.0 | 300.0 | 22.0 | 211475.0
414 | T97b Gein Zuidwest | 421.0 | 41.0 | 70.0 | 72.0 | 51.0 | 186.0 | 10.0 | 221090.0
415 | T97c Gein Noordoost | 380.0 | 362.0 | 91.0 | 325.0 | 107.0 | 125.0 | 14.0 | 332620.0
416 | T97d Gein Zuidoost | 718.0 | 242.0 | 97.0 | 220.0 | 104.0 | 274.0 | 17.0 | 280695.0
417 | T98a Dorp Driemond | 193.0 | 144.0 | 42.0 | 139.0 | 60.0 | 68.0 | NaN | 351711.0